What is this Analysis?

This notebook examines the current ratings on boardgamegeek.com with the aim of adjusting and computing different versions of these ratings. The main goal is to compute alternative ranks that adjust for the tendency of the BGG ratings to skew towards complex and recently released games. This is not a novel contribution, but this notebook aims to explain some of the considerations behind calculating different ratings for games, as well as to refresh these ratings frequently by scraping the most up-to-date data from the BGG API.

1 Examining BGG Ratings

We’ll load in the active data we have from BGG and begin our analysis. There are a couple of different variables we’re interested in exploring: the average rating, the geek rating (bayesaverage), and the average weight, or complexity. For the average and geek ratings, votes can range from 1 to 10, while complexity ranges from 1 to 5.

If we plot average rating of games vs their complexity, we can see that a game’s rating is heavily correlated with its complexity.

We can also plot this with labels to pick out particular games.

The average rating is also a function of time, as we see a general rise in the average for more recently released games.

If we size games by their number of user ratings, we can see when the number of user ratings for games really starts to take off.

1.1 What is the Geek Rating?

Now, so far we’ve just looked at the average rating, which is simply the average of community ratings for a given game. Most people care about the Geek rating, which uses a combination of the community average and the number of votes. In order to get a high Geek rating, a game needs to both be well rated (a high average) and have a high enough number of user ratings. We can get a sense of how games map on both of these dimensions via a visualization I have come to dub ‘the mountain’.

The geek rating is designed to capture a mix of being highly rated and popular. To achieve this, boardgamegeek uses a form of Bayesian averaging, (roughly) starting every game off with a set number of votes at a lower value, which means that it takes a decent number of users rating the game to actually move the geek rating away from this baseline.
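The notebook’s analysis code is in R, but the mechanic is easy to sketch in a few lines of Python. The dummy-vote count and value here are illustrative assumptions, not BGG’s actual (unpublished) numbers:

```python
# Illustrative sketch of a Bayesian (dummy-vote) average. The dummy count and
# value are assumptions for demonstration, not BGG's actual parameters.
def bayes_average(avg_rating, n_ratings, dummy_votes=1500, dummy_value=5.5):
    """Shrink a game's average toward dummy_value; more real votes, less shrinkage."""
    return (avg_rating * n_ratings + dummy_value * dummy_votes) / (n_ratings + dummy_votes)

# A 9.0-rated game with only 50 votes barely moves off the baseline...
print(round(bayes_average(9.0, 50), 2))      # 5.61
# ...while the same average with 50,000 votes stays close to 9.0.
print(round(bayes_average(9.0, 50_000), 2))  # 8.9
```

The key property is that the dummy votes dominate until a game accumulates a comparable number of real ratings, which is exactly why obscure-but-loved games sit near the baseline.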

We can see this if we plot all games based on their average rating as well as their number of user ratings, sizing the points by the number of ratings.

BGG hasn’t, to my knowledge, ever made it clear precisely how they compute the geek rating. I’ve seen other people try to reconstruct it, and the general consensus seems to be that they add about 1500 dummy votes of 5.5 to every game.

## # A tibble: 3 × 5
## # Groups:   .metric [3]
##   dummy_ratings dummy_average .metric .estimator .estimate
##           <dbl>         <dbl> <chr>   <chr>          <dbl>
## 1          1900           5.5 mae     standard       0.011
## 2          1950           5.5 rmse    standard       0.02 
## 3          1850           5.5 rsq     standard       0.997

We actually get closest with roughly 1900 votes at 5.5. What are our misses when we use this as our formula? I’ve seen some people speculate it’s a function of the standard deviation.

It doesn’t look to be the case, but it is interesting to look at the games where the actual geek rating is very different from my estimated version. Alien: USCSS Nostromo, The Fantasy Trip: Legacy Edition, and The Binding of Isaac have really big differences, and if we look on BGG they all seem to have a disproportionate number of either 1s or 10s. I wonder if the Geek rating filters out ratings from accounts that are flagged as just spamming ratings?

I’ve also seen people wonder whether they add more dummy ratings to older/newer games, but I haven’t seen any real evidence of a difference by `yearpublished`.

At any rate, we should be good to go by using roughly 1800 dummy votes at 5.5.
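To make the reconstruction concrete, here is a sketch of the grid search implied above: try a range of dummy-vote counts and keep the one whose estimated geek ratings best match the actual ones. The notebook does this in R with yardstick metrics; this Python version with mean absolute error, and the function names and candidate range, are my own illustration:

```python
import numpy as np

def geek_estimate(avg, n, dummy_votes, dummy_value=5.5):
    """Bayesian average with a given number of dummy votes at dummy_value."""
    return (avg * n + dummy_value * dummy_votes) / (n + dummy_votes)

def best_dummy_count(avg, n, actual_geek, candidates=range(1000, 3001, 50)):
    """Return the dummy-vote count whose estimates best match the actual
    geek ratings (lowest mean absolute error over all games)."""
    errors = [np.mean(np.abs(geek_estimate(avg, n, k) - actual_geek)) for k in candidates]
    return list(candidates)[int(np.argmin(errors))]
```

As a quick check, if we generate geek ratings with a known dummy count, the search recovers it exactly; on the real data the minimum-error count landed around 1900.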

2 Adjusting for Complexity

The geek ratings on BGG are highly influential, but they skew heavily towards very complex games. This means that “normal people” will have a hard time making use of the list of games that are highly rated on BGG.

If we look at the top 100 games on BGG according to the geek ratings, we can see that all generally fall on the heavier side in terms of game complexity.

We want a list of games that isn’t so heavily skewed towards complexity. We want to “control for” the influence of complexity on the rating. That is, if we take the variation in the average ratings that isn’t explained by complexity, which games still have high ratings?

2.1 Fitting a Simple Regression

To do this, we need the residuals from a regression of average rating on complexity - these will be the variation in game ratings that are not explained by complexity. We will fit the bivariate model on games published through 2021 and inspect the results.

The intercept indicates the average rating of a game with a complexity of 0, which is kind of nonsensical. The coefficient indicates the effect of a unit increase in the complexity of a game on the average rating. Putting these two together tells us that a game with a complexity rating of 1 would have a rating of 5.8. A game with a complexity rating of 5, meanwhile, would have a rating of 8.14.
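As a sanity check on the arithmetic, we can back-solve approximate coefficients from the two predictions quoted above. These are illustrative values recovered from the text, not the notebook’s exact fitted estimates:

```python
# Back-solving approximate coefficients from the two predictions in the text:
# complexity 1 -> 5.8 and complexity 5 -> 8.14. Illustrative, not the exact fit.
slope = (8.14 - 5.8) / (5 - 1)   # ~0.585 rating points per unit of complexity
intercept = 5.8 - slope * 1      # ~5.215, the (nonsensical) complexity-0 rating

def predicted_rating(complexity):
    """Predicted average rating under the simple linear model."""
    return intercept + slope * complexity

print(round(predicted_rating(3), 2))  # a mid-weight game lands around 6.97
```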

The model indicates that the complexity of a game explains about 26% of the variation in average ratings (the R-squared, which in a bivariate model is simply the correlation coefficient we saw earlier, squared). So it’s not the only thing that matters, but it has a pretty sizeable impact on the average rating and the corresponding geek rating.

2.2 Examining the Residuals

We don’t really care about the model per se; we just want the residuals.

The point of the residuals is that they are the variation in a game’s average rating that is not explained by complexity, meaning we will see no correlation between complexity and these residuals.
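This zero-correlation property holds by construction for least squares. A quick Python sketch on synthetic data (the coefficients and noise level here are made up) demonstrates it:

```python
import numpy as np

# Synthetic ratings as a linear function of complexity plus noise.
rng = np.random.default_rng(0)
complexity = rng.uniform(1, 5, 1000)
rating = 5.2 + 0.59 * complexity + rng.normal(0, 0.5, 1000)

# Fit the bivariate regression and compute residuals.
slope, intercept = np.polyfit(complexity, rating, 1)
residuals = rating - (intercept + slope * complexity)

# OLS residuals are uncorrelated with the regressor (up to floating point).
print(round(float(np.corrcoef(complexity, residuals)[0, 1]), 6))
```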

A positive residual in this case is a game that has a higher than expected average given its complexity. But we can’t just use the adjusted ratings alone, because doing so skews heavily towards games that have a highly inflated average rating due to only having a handful of user ratings.

2.3 Computing Complexity-Adjusted Ratings

We will adjust this using an approach similar to BGG’s Bayesian averaging methodology, adding 1800 ratings at the average of 5.5.
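One plausible way to implement this, sketched in Python: take each game’s residual, re-center it on the 5.5 baseline, then shrink with dummy votes exactly as before. The re-centering choice and the regression coefficients here are my assumptions for illustration; the notebook’s R code may differ in the details:

```python
def complexity_adjusted(avg, n, complexity, intercept=5.215, slope=0.585,
                        dummy_votes=1800, dummy_value=5.5):
    """Complexity-adjusted rating: remove the part of the average explained by
    complexity, re-center on the baseline, then apply Bayesian shrinkage.
    Coefficients and re-centering are illustrative assumptions."""
    residual = avg - (intercept + slope * complexity)
    adjusted_avg = dummy_value + residual  # hypothetical re-centering choice
    return (adjusted_avg * n + dummy_value * dummy_votes) / (n + dummy_votes)
```

Under this construction, a game sitting exactly on the regression line stays at the 5.5 baseline regardless of its vote count, while a game rated well above its complexity-predicted average rises only once it has enough real ratings.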

2.4 Examining the Top Complexity-Adjusted Games

We can put this all together to now look at games that are rated highly after adjusting for the effect of complexity. We’ll look at the top 250 to keep it simple.

This gets us a list of games that is wildly different than before, and it’s a list of games that are much more palatable to “normal people”. In my mind, Crokinole taking the top spot makes a ton of sense and I will die on this hill.

2.4.1 Interactive Table for Complexity Adjusted Ratings

I’ll make an interactive table here so it’s easy to search for games and compare how games stack up on the different rankings.

2.4.2 Which Games Move Down the Most?

Which games have moved up and down the most? We can look at the difference between the BGG Rank and the Complexity adjusted ranks.

We want to focus in on the games that see a positive and negative shift in particular. Let’s look at the games that are penalized the most.

This list makes a lot of sense to me - the ratings themselves are still pretty good for some of these games, but these are all very, very heavy games. A lot of Vital Lacerda showing up in here, which to me epitomizes the disconnect between Geek Ratings and ‘ratings for people who like fun games’. The former is heavily slanted towards people who relish complexity, whereas the latter is slanted towards games that provide a lot of bang for their buck in terms of their weight.

2.4.3 Which Games Move Up the Most?

The list of games that go up the most is, on the other hand, full of very light party games. Even with the boost from being simple, many of these games still aren’t rated that highly, though some notable ones (Monikers, MicroMacro, KLASK, Just One) end up near the top of the overall list.

2.4.4 Movement Within the BGG Top 100

Let’s restrict to games inside the BGG Top 100 and see how these games are affected.

It’s important to remember that this is meant to be a ranking list for someone who sees complexity as a negative more so than a positive. Terra Mystica and Gaia Project might be great games for someone like me who is into the hobby, but would they rank highly for someone who isn’t that keen on games? Probably not.

3 Adjusting for “The Hotness”

The list of complexity adjusted games is pretty great for recommending games to a beginner, but it’s not as helpful for people who have been in the hobby longer. If you do like more complex games, what list should you look at? The Geek list as it currently stands is still probably good, but it’s not without its problems.

3.1 Why is the BGG Top 100 full of recently released games?

The Geek Ratings are heavily skewed towards games that have been released in recent years: all of the top 10 games were released after 2015, and only 5 of the top 50 were released prior to 2010. Either we are truly seeing the pinnacle of game design in the last 10 years (this is possible), or BGG users have a tendency to seek out and rate the hotness, which skews the list towards recent games.

Why do newer games so quickly climb the geek list? I would argue that this due to the fact that the geek rating doesn’t use enough votes to start its Bayesian average. 1800 votes or so was probably a good prior for when board games had a much smaller audience, but as BGG and the hobby has grown, newer games rapidly attract enough user ratings that they manage to quickly overtake this prior.

If we plot the average and median number of user ratings for games published in each year, we can see how more recently published games draw lots of user ratings (this does start to taper off for games in the last 1-2 years, though we would expect those numbers to rise as these games accumulate ratings).

This isn’t a problem per se, but it does mean that two games might actually be pretty similar in quality, but the one released more recently draws lots of user ratings, and so it gets a big boost to its rating on the Geek list.

3.1.1 Is the Hotness a Problem?

I’m not going to argue that this is a fundamental flaw in the rating that the site uses. But I do think it has changed the meaning of the rating over time. Presumably, the idea behind using the Bayesian average is to indicate games that combine both popularity and a high rating. On some level, we’d like the ratings to be that intersection of ‘these are games that have been well received and are very popular’.

If we go back to that plot of game averages plotted against logged user ratings, this basically means finding games that are at the top of the ‘mountain’. But I’ve highlighted the games in the top 100 to show that this isn’t quite the case with the geek rating. Popularity ends up mattering less than we might want it to, given the relationship between the two components.

3.1.2 Amending the Formula

Fortunately for us, we can adjust for this in a pretty simple way by upping the number of baseline 5.5 votes for every single game. Rather than using the 1800 or so that BGG uses, we’ll toggle the number of baseline ratings at various thresholds up to 100,000 votes and see how the rankings start to change. Games that maintain a high ranking at each threshold will be games that have both a large number of user ratings and a high average. Games that are sensitive to the number of baseline user ratings will be ones that are ‘inflated’ in the current ranking system.
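Here is a small Python sketch of that sweep. The three games and their averages and vote counts are entirely made up, chosen to show how the ranking reshuffles as the prior grows:

```python
def bayes_avg(avg, n, k, v=5.5):
    """Bayesian average with k dummy votes at value v."""
    return (avg * n + v * k) / (n + k)

# Hypothetical games: (average rating, number of user ratings).
games = {
    "Evergreen Hit":  (8.3, 90_000),  # huge vote count, very high average
    "New Hotness":    (8.9, 12_000),  # higher average, far fewer votes
    "Niche Favorite": (9.2, 2_500),   # tiny but enthusiastic base
}

for k in (1_800, 10_000, 100_000):
    ranked = sorted(games, key=lambda g: -bayes_avg(*games[g], k))
    print(k, ranked)
# With the small ~1800-vote prior the hot new game tops the list;
# as the prior grows, the evergreen game with 90k votes takes over.
```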

What do we find? We can make a table of games with their rankings at different thresholds. There’s no getting around the fact that Gloomhaven is going to be a top game: it has an extremely high average with a lot of votes. The same goes for Terraforming Mars, which has an extremely high number of user ratings with a high average - if we set the number of baseline votes over 25k, Terraforming Mars becomes the top game of all time.

There’s no correct number of votes to add, but it is interesting to see how games are affected by adding more votes. If we add 100k votes we basically end up with a list that makes it hard for recent games to enter the top 100 (Gloomhaven: Jaws of the Lion, Marvel Champions, Rising Sun, Underwater Cities, Dune: Imperium) and provides a boost to some of the pillars of the board game renaissance (Pandemic, Ticket to Ride, 7 Wonders, Puerto Rico, Agricola). Man, evidently Terraforming Mars is really good?

3.2 Top Board Games with 100k Votes Added

What are the top games if we use a prior with 100k votes rather than the roughly 1800 that BGG uses?

Adding 100k votes gives us a list of games that more or less maps to the evergreen games of the last two decades, and it actually shows a decent balance of complex and simple games. I would argue that this is a stronger list for people who are interested in the hobby but aren’t necessarily going to chase the hotness.

Going back to ‘the mountain’ visual, what games does this set of rankings end up prioritizing?

Hooray! That’s about what I was hoping to see.

3.3 Interactive Table

I’ll now put this same information into a datatable to make it a bit more interactive; you can sort and search for any game and see how it compares across the different thresholds.